lora branch
Gated Integration of Low-Rank Adaptation for Continual Learning of Large Language Models
Liang, Yan-Shuo, Chen, Jia-Rui, Li, Wu-Jun
Continual learning (CL), which requires the model to learn multiple tasks sequentially, is crucial for large language models (LLMs). Recently, low-rank adaptation (LoRA), one of the most representative parameter-efficient fine-tuning (PEFT) methods, has gained increasing attention in CL of LLMs. However, most existing CL methods based on LoRA typically expand a new LoRA branch to learn each new task and force the new and old LoRA branches to influence old tasks equally, potentially leading to forgetting. In this work, we propose a new method, called gated integration of low-rank adaptation (GainLoRA), for CL of LLMs. GainLoRA expands a new LoRA branch for each new task and introduces gating modules to integrate the new and old LoRA branches. Furthermore, GainLoRA leverages the new gating module to minimize the influence from the new LoRA branch to old tasks, effectively mitigating forgetting and improving the model's overall performance. Experimental results on CL benchmarks demonstrate that GainLoRA outperforms existing state-of-the-art methods.
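The idea of integrating several LoRA branches through gates can be illustrated with a minimal sketch. This is not the GainLoRA implementation: the sigmoid gate, the dictionary layout, and the names `gated_lora_forward`, `gate_w`, and `gate_b` are assumptions made for illustration only. It shows a frozen weight `W` whose output is corrected by each branch's low-rank update `B @ A @ x`, scaled by an input-dependent gate in [0, 1], so that a gate near 0 removes that branch's influence on the input.

```python
import math

def matvec(M, v):
    # Plain matrix-vector product on nested lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_lora_forward(W, branches, x):
    # W: frozen pretrained weight, shape (d_out, d_in).
    # branches: list of dicts with low-rank factors "A" (r, d_in) and
    # "B" (d_out, r), plus hypothetical gate parameters "gate_w" and "gate_b".
    out = matvec(W, x)
    for br in branches:
        # Assumed gate: a sigmoid of a linear function of the input.
        gate = sigmoid(sum(w * xi for w, xi in zip(br["gate_w"], x)) + br["gate_b"])
        delta = matvec(br["B"], matvec(br["A"], x))
        # A gate near 0 suppresses this branch; a gate near 1 applies it fully.
        out = [o + gate * d for o, d in zip(out, delta)]
    return out

W = [[1.0, 0.0], [0.0, 1.0]]
x = [2.0, 3.0]
branch = {"A": [[1.0, 1.0]], "B": [[1.0], [0.0]], "gate_w": [0.0, 0.0], "gate_b": -50.0}
closed = gated_lora_forward(W, [branch], x)   # gate is almost 0
branch["gate_b"] = 50.0
opened = gated_lora_forward(W, [branch], x)   # gate is almost 1
```

With the gate closed, the output stays at the frozen model's output; with it open, the branch's full low-rank update is added. Equal-influence integration, which the abstract identifies as a source of forgetting, corresponds to fixing every gate at the same constant.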
Hot-Swap MarkBoard: An Efficient Black-box Watermarking Approach for Large-scale Model Distribution
Zhang, Zhicheng, Lv, Peizhuo, Wan, Mengke, Fang, Jiang, Guo, Diandian, Chen, Yezeng, Liu, Yinlong, Ma, Wei, Sun, Jiyan, Geng, Liru
Recently, Deep Learning (DL) models have been increasingly deployed on end-user devices as On-Device AI, offering improved efficiency and privacy. However, this deployment trend poses more serious Intellectual Property (IP) risks, as models are distributed on numerous local devices, making them vulnerable to theft and redistribution. Most existing ownership protection solutions (e.g., backdoor-based watermarking) are designed for cloud-based AI-as-a-Service (AIaaS) and are not directly applicable to large-scale distribution scenarios, where each user-specific model instance must carry a unique watermark. These methods typically embed a fixed watermark, and modifying the embedded watermark requires retraining the model. To address these challenges, we propose Hot-Swap MarkBoard, an efficient watermarking method. It encodes user-specific $n$-bit binary signatures by independently embedding multiple watermarks into a multi-branch Low-Rank Adaptation (LoRA) module, enabling efficient watermark customization without retraining through branch swapping. A parameter obfuscation mechanism further entangles the watermark weights with those of the base model, preventing removal without degrading model performance. The method supports black-box verification and is compatible with various model architectures and DL tasks, including classification, image generation, and text generation. Extensive experiments across three types of tasks and six backbone models demonstrate our method's superior efficiency and adaptability compared to existing approaches, achieving 100% verification accuracy.
HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture
Wu, Taiqiang, Ding, Chenchen, Zhou, Wenyong, Cheng, Yuxin, Feng, Xincheng, Wang, Shuqi, Shi, Chufan, Liu, Zhengwu, Wong, Ngai
Low-rank adaptation (LoRA) is a predominant parameter-efficient finetuning method to adapt large language models (LLMs) for downstream tasks. In this paper, we first propose to deploy the LoRA-finetuned LLMs on the hybrid compute-in-memory (CIM) architecture (i.e., pretrained weights onto RRAM and LoRA onto SRAM). To address performance degradation from RRAM's inherent noise, we design a novel Hardware-aware Low-rank Adaptation (HaLoRA) method, aiming to train a LoRA branch that is both robust and accurate by aligning the training objectives under both ideal and noisy conditions. Experiments finetuning LLaMA 3.2 1B and 3B demonstrate HaLoRA's effectiveness across multiple reasoning tasks, achieving up to 22.7 improvement in average score while maintaining robustness at various noise levels. Large language models (LLMs), such as GPT-4 [9], LLaMA [6], and Qwen [10], have demonstrated promising performance in various Natural Language Processing (NLP) tasks. However, this success, primarily driven by massive model parameters, brings forth two critical challenges in practical applications. First, adapting LLMs to downstream tasks via full model fine-tuning requires prohibitive computational resources.
AutoLoRa: A Parameter-Free Automated Robust Fine-Tuning Framework
Xu, Xilie, Zhang, Jingfeng, Kankanhalli, Mohan
With the emergence of foundation models (Bommasani et al., 2021), fine-tuning the pre-trained feature extractor (FE) has become a low-cost strategy to obtain superior performance in downstream tasks. Notably, GPT-3 (Brown et al., 2020) can achieve state-of-the-art (SOTA) performance on GLUE benchmarks (Wang et al., 2018) via parameter-efficient fine-tuning (Hu et al., 2021). Due to the ubiquitous existence of adversarial attacks (Goodfellow et al., 2014; Madry et al., 2018), adopting pre-trained FEs to safety-critical downstream areas such as medicine (Buch et al., 2018) and autonomous cars (Kurakin et al., 2018) necessitates the strategy of robust fine-tuning (Hendrycks et al., 2019) that can yield adversarial robustness in downstream applications. Robust fine-tuning (RFT) (Hendrycks et al., 2019) that contains an adversarial objective to learn features of adversarial data (Madry et al., 2018) can gain adversarial robustness in downstream tasks. To further improve generalization, vanilla RFT (formulated in Eq. 1, shown in the left panel of Figure 1c) optimizes both adversarial and natural objectives to learn the features of adversarial and natural data simultaneously via the FE (Zhang et al., 2019; Shafahi et al., 2019; Jiang et al., 2020).